Corpus: vls_wikipedia_2021_10K

4.4.1.5 Number of Word-N-grams at Sentence Endings

Number of distinct word N-grams (N = 1...5) at the ends of the first K sentences

K          # of words   # of bigrams   # of trigrams   # of 4-grams   # of 5-grams
100                97             99              99             99             99
1000              854            988             998            999            999
10000            6117           8914            9511           9636           9667
100000           6117           8914            9512           9637           9668
1000000          6117           8914            9512           9637           9668
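
A minimal sketch of how counts like these could be computed, under stated assumptions: the function name `ending_ngram_counts` is hypothetical, tokens are obtained by splitting on whitespace, and an "N-gram at a sentence ending" is taken to be the last N tokens of a sentence. The corpus's actual tokenizer and its handling of sentence-final punctuation are not given here, so this is an illustration rather than the method used to produce the table.

```python
from typing import Dict, Iterable, List, Set, Tuple


def ending_ngram_counts(
    sentences: List[str], ks: Iterable[int], max_n: int = 5
) -> Dict[int, List[int]]:
    """For each K (K >= 1), count the distinct word N-grams, N = 1..max_n,
    formed by the last N tokens of each of the first K sentences."""
    thresholds = sorted(k for k in set(ks) if k >= 1)
    # seen[n - 1] collects the distinct ending N-grams observed so far.
    seen: List[Set[Tuple[str, ...]]] = [set() for _ in range(max_n)]
    result: Dict[int, List[int]] = {}
    next_k = 0

    for i, sentence in enumerate(sentences, start=1):
        tokens = sentence.split()  # assumption: whitespace tokenization
        for n in range(1, max_n + 1):
            if len(tokens) >= n:  # sentences shorter than N contribute nothing
                seen[n - 1].add(tuple(tokens[-n:]))
        # Record a snapshot once exactly K sentences have been processed.
        if next_k < len(thresholds) and thresholds[next_k] == i:
            result[i] = [len(s) for s in seen]
            next_k += 1

    # Any K beyond the number of sentences gets the final totals.
    for k in thresholds[next_k:]:
        result[k] = [len(s) for s in seen]
    return result


if __name__ == "__main__":
    demo = ["the cat sat", "a dog sat", "the cat sat"]
    for k, counts in ending_ngram_counts(demo, [1, 2, 3, 10], max_n=3).items():
        print(k, counts)
```

In this sketch, any K larger than the number of available sentences simply repeats the final totals, which matches the identical rows for K = 100000 and K = 1000000 in the table.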


Zipf's diagram for sentence endings (Gnuplot diagram)

1731 msec needed at 2021-06-28 01:01